Discovering phrases in machine translation by simulated annealing
Abstract
In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem: we use a simulated annealing algorithm to find the best phrase translations among all those proposed by the inter-lingual triggers. The best phrases are those that improve translation quality in terms of BLEU score. Experiments are carried out on the proceedings of the European Parliament. Training uses a corpus of 596K parallel sentences (French-English), and testing uses a corpus of 1444 sentences. With only 8.1% of the identified source phrases occurring in the test corpus, our system outperforms the baseline model by almost 3 points.
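The selection step described in the abstract can be pictured with a minimal sketch of simulated annealing over subsets of candidate phrase pairs. This is an illustration under stated assumptions, not the authors' implementation: the names `simulated_annealing`, `toy_score` and `TOY_GAIN`, the geometric cooling schedule, the single-pair toggle move, and the example phrase pairs are all hypothetical. In particular, `toy_score` is a stand-in for the real objective, which according to the abstract is the BLEU score obtained with the selected phrases.

```python
import math
import random

# Hypothetical candidate phrase pairs with made-up "gains". In the setting
# of the abstract, the score would come from BLEU, not from a lookup table.
TOY_GAIN = {
    ("parlement européen", "european parliament"): 1.0,
    ("parlement européen", "parliament"): -0.5,
    ("ordre du jour", "agenda"): 0.5,
    ("ordre du jour", "order of the day"): -0.25,
}


def toy_score(subset):
    """Stand-in objective: sum of the made-up gains of the selected pairs."""
    return sum(TOY_GAIN[pair] for pair in subset)


def simulated_annealing(candidates, score, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Search for a subset of `candidates` that maximizes `score(subset)`."""
    rng = random.Random(seed)
    current = set()
    current_score = score(current)
    best, best_score = set(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        fraction = step / max(1, steps - 1)
        temperature = t_start * (t_end / t_start) ** fraction
        # Neighbour move: add or remove one randomly chosen phrase pair.
        neighbour = set(current)
        neighbour.symmetric_difference_update({rng.choice(candidates)})
        neighbour_score = score(neighbour)
        delta = neighbour_score - current_score
        # Always accept improvements; accept a worse subset with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = neighbour, neighbour_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score


if __name__ == "__main__":
    selected, value = simulated_annealing(list(TOY_GAIN), toy_score)
    for pair in sorted(selected):
        print(pair)
    print("score:", value)
```

Because worse subsets are sometimes accepted early on, the search can leave a local optimum before the temperature drops and it settles. With a BLEU-based objective each evaluation would require translating a development set, so the number of steps would have to be chosen with that cost in mind.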
Related resources
Phrase-Based Machine Translation based on Simulated Annealing
In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold. First, we identify common source phrases. Then we use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem. For that we use simulated annealing algorithm to f...
Machine Translation Using Overlapping Alignments and SampleRank
We present a conditional-random-field approach to discriminatively trained phrase-based machine translation in which training and decoding are both cast in a sampling framework and are implemented uniformly in a new probabilistic programming language for factor graphs. In traditional phrase-based translation, decoding infers both a "Viterbi" alignment and the target sentence. In contrast, in our...
A simulated annealing algorithm to determine a group layout and production plan in a dynamic cellular manufacturing system
In this paper, a mixed-integer linearized programming (MINLP) model is presented to design a group layout (GL) of a cellular manufacturing system (CMS) in a dynamic environment while considering production planning (PP) decisions. This model incorporates extensive coverage of important manufacturing features used in the design of CMSs. There are also some features that make the presented...
Hybrid artificial immune system and simulated annealing algorithms for solving hybrid JIT flow shop with parallel batches and machine eligibility
This research deals with a hybrid flow shop scheduling problem with parallel batching, machine eligibility, unrelated parallel machines, and different release dates, with the aim of minimizing the sum of the total weighted earliness and tardiness (ET) penalties. In the parallel batching situation, it is assumed that a number of machines in some stages are able to process a certain number of jobs simultaneously. First...
Multimodal Comparable Corpora as Resources for Extracting Parallel Data: Parallel Phrases Extraction
Discovering parallel data in comparable corpora is a promising approach for overcoming the lack of parallel texts in statistical machine translation and other NLP applications. In this paper we propose an alternative to comparable corpora of texts as resources for extracting parallel data: a multimodal comparable corpus of audio and texts. We present a novel method to detect parallel phrases fr...